An $\Theta(\sqrt{n})$-depth Quantum Adder on a 2D NTC Quantum Computer Architecture

In this work, we propose an adder for the 2D NTC architecture, designed to match the architectural constraints of many quantum computing technologies. The chosen architecture allows the layout of logical qubits in two dimensions and the concurrent execution of one- and two-qubit gates with nearest-neighbor interaction only. The proposed adder works in three phases. In the first phase, the first column generates the summation output and the other columns do the carry-lookahead operations. In the second phase, these intermediate values are propagated from column to column, preparing for computation of the final carry for each register position. In the last phase, each column, except the first one, generates the summation output using this column-level carry. The depth and the number of qubits of the proposed adder are $\Theta(\sqrt{n})$ and $O(n)$, respectively. The proposed adder executes faster than the adders designed for the 1D NTC architecture when the length of the input registers $n$ is larger than 58.

1 Introduction

Quantum computers have been proposed to exploit the exotic properties of quantum mechanics for information processing. Among many potential uses, two quantum algorithms have received the bulk of the attention. One is Shor's large number factoring algorithm [1], and the other is Grover's unstructured database search algorithm [2], though there has also been much progress recently on other algorithms [3, 4, 5]. Quantum algorithms are often shown to be more efficient than classical ones by analyzing the number of queries to an oracle. However, for a more exact performance analysis, we need to analyze the quantum algorithms in terms of the detailed quantum circuits necessary to implement them. As in classical computation, arithmetic is a core set of subroutines whose behavior strongly impacts the performance of the overall algorithm; hence we focus on the adder in this work.

Numerous quantum addition circuits have been proposed using abstract models of the computer itself. The basic elementary quantum arithmetic operations including addition have been proposed by Vedral et al. [6] and Beckman et al. [7], following seminal work on elementary reversible full- and half-adders by Fredkin and Toffoli [8], and Feynman [9]. Glassner proposed a one-qubit full adder [10]. Subsequently, Cheng and Tseng proposed an n-qubit full adder and subtractor based on the work of Glassner [11]. Reducing the space requirements for those earlier adders [6], Cuccaro et al. proposed a linear-depth ripple-carry adder with only a single ancillary qubit [12]. Meanwhile, Draper proposed a transform adder based on the quantum Fourier transform [13]. Draper et al. proposed a fast quantum carry-lookahead adder [14]. Takahashi and Kunihiro have shown that addition can actually be performed with no ancillae, at the expense of a deeper circuit [15].

Incorporating the behavior of these circuits, we can estimate the overall quantum speedup more accurately than simply addressing the issue at the query level, and confirm again that the quantum speedup is very high. However, it is not possible to determine the exact performance gain unless the practical issues of architecture are considered; both the constant factors and the leading order of both the computational complexity and minimum execution time (or circuit depth) depend on the assumed underlying machine. Hence we have to consider many issues such as error correction, communication, gate, and qubit technologies [16]. For example, Maslov et al. [17] pointed out the importance of the problem of placing circuit variables on the underlying qubit layout. Unfortunately, it is impossible to consider all practical issues at the same time. To avoid this problem, we usually define a practical quantum computer architecture incorporating as many practical constraints as possible. For many quantum computer architectures, the 2D NTC architecture is a reasonable model capturing the key factors that impact performance. NTC allows Nearest-neighbor interactions, Two-qubit quantum gates, and Concurrent executions of gates [18]. An example of a potentially scalable architecture with the nearest-neighbor constraint is that of Kielpinski et al. [19]. Barenco et al. [20] showed one way to decompose a given quantum circuit into two-qubit gates. Steane [21] investigated the necessity of concurrent execution for error correction and fault-tolerance; concurrency is also required at the application level for high performance. The 2D layout allows a single qubit to interact with four neighboring qubits. With more neighboring qubits than the 1D case, the 2D layout should show higher performance, thanks to reduced distance between many pairs of qubits and the potential for more concurrent movement of qubits. Likewise, a 3D layout should show higher performance than the 2D case, but the complexity of fabricating and controlling qubits in three dimensions likely makes it impractical. Therefore, we believe that the 2D layout is the most reasonable choice at the middle level of performance and control overhead. Thus, it would be interesting to understand the quantum speedup in this context.

Surprisingly, as far as we know, there is no quantum addition circuit designed specifically for the 2D NTC architecture. Hence we have to design a quantum addition circuit for the 2D NTC architecture and estimate the performance gain. Based on this, our contributions are as follows.

• Propose a quantum adder on the 2D NTC architecture. 

First, we lay out the qubits in a $\sqrt{n} \times \sqrt{n}$ array where $n$ is the input size, in qubits.
Based on this layout, we propose a three-phase quantum addition algorithm. In the 
first phase, the first column does a ripple-carry addition and the other columns do 
carry-lookahead operations. In the second phase, the column-level carry is propagated 
in ripple fashion between the columns. In the last phase, each column transports its 
column-level carry input into the cells to generate the final summation value. 

• Analyze the proposed adder. 

We decompose the necessary quantum circuit blocks using only one- and two-qubit gates. Next, we add the SWAP operations necessary to transport qubits in order to satisfy the NTC constraint. We found that the depth of the proposed adder is $150\sqrt{n} - 90$ in terms of one- and two-qubit gates. Asymptotically, the depth is $O(\sqrt{n})$, meeting the depth lower bound we established in earlier work [22]. To execute many quantum gates in parallel, the proposed adder utilizes $2n - \sqrt{n}$ additional working qubits.

Since the 2D NTC layout generalizes the 1D NTC architecture, the adders designed for the 1D NTC architecture can also be implemented on the 2D NTC architecture without modification. After reevaluating the depth of the adders for the 1D NTC architecture, we find that our new 2D adder works faster when $n > 58$.

This paper is organized as follows. We explain the addition algorithm, and the qubit and circuit layouts for the 2D NTC architecture, in Section 2. The temporal and spatial resources are analyzed in Section 3. Finally, we conclude this work and point out some open problems in Section 4.

2 Adder on the 2D NTC Structure 

In this section, we first explain how the qubits are laid out on the 2D structure. Second, we explain an addition algorithm based on a slight modification of carry-lookahead addition. Third, we discuss how the addition algorithm is mapped onto the circuit blocks. Finally, we show how the ancillae qubits can be reinitialized.

2.1 Qubit Layout 

On the 2D NTC structure, we can lay out the qubits as shown in Figure 1. In the figure, the two input registers are $A = a_n \cdot 2^{n-1} + a_{n-1} \cdot 2^{n-2} + \cdots + a_1$ and $B = b_n \cdot 2^{n-1} + b_{n-1} \cdot 2^{n-2} + \cdots + b_1$. As shown in the figure, the two inputs $a_i$ and $b_i$ are interleaved, where $1 \le i \le n$. The numbers of rows and columns are $2\sqrt{n}$ and $\sqrt{n}$, respectively. The two inputs $a_i$ and $b_i$ are located at the ($k$-th column, $j$-th row) cell, where $k = \lceil i/\sqrt{n} \rceil$ and $j = i - (k-1)\sqrt{n}$. The figure shows only the input qubits for clarity. For simplicity, we assume without loss of generality that $\sqrt{n}$ is an integer.
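The index mapping above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that $n$ is a perfect square; the function names `cell_of` and `position_of` are hypothetical and do not come from the paper.

```python
import math


def cell_of(i: int, n: int) -> tuple[int, int]:
    """Map bit position i (1-indexed) to its (column k, row j) cell."""
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n is assumed to be a perfect square")
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    k = (i + r - 1) // r          # integer form of ceil(i / sqrt(n))
    j = i - (k - 1) * r           # row index inside column k
    return k, j


def position_of(k: int, j: int, n: int) -> int:
    """Inverse map: bit position i held by cell (k, j)."""
    r = math.isqrt(n)
    return (k - 1) * r + j
```

For example, with $n = 16$ the positions 1 to 4 fill the first column, and position 5 lands in cell $(2, 1)$.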

2.2 Adapting Carry-Lookahead Addition to Limited Interaction Distance 

To set the stage for the later arithmetic discussions, let us first explain ripple-carry addition for two $n$-qubit input registers, $a$ and $b$. Since the summation value for the $i$-th position, $s_i$, is generated by $a_i \oplus b_i \oplus c_i$, where $a_i$ and $b_i$ are the $i$-th qubits in the input registers, and $c_i$ is the carry input from the summation of the $(i-1)$-th position, the time complexity of the addition depends on how fast the carry information can be transported between the bit positions.

Fig. 1. Layout of Input Qubits for a 2D NTC adder. The $i$-th qubit is located at position $(k, j)$, where $k = \lceil i/\sqrt{n} \rceil$ and $j = i - (k-1)\sqrt{n}$. Ancillae qubits are not shown for simplicity.

The simplest circuit is the ripple carry adder, which propagates the carry information stepwise from position to position. The carry output for the $(i+1)$-th position, $c_{i+1}$, should be one if a majority of the bits $a_i$, $b_i$, and $c_i$ are one, and zero otherwise; it is generated by $a_i \cdot b_i \oplus a_i \cdot c_i \oplus b_i \cdot c_i$. Therefore, the final summation value $s_n$ is generated only after $n$ ripple carry time steps.
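As a purely classical illustration of the recurrence just described (not a quantum circuit), the following Python sketch adds two bit lists position by position. The helper names `to_bits`, `from_bits`, and `ripple_carry_add` are hypothetical.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian bit list: element 0 is the least significant bit."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Return (sum bits, final carry) using the majority rule for the carry."""
    c = 0                                     # carry into the first position
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)              # sum bit of this position
        c = (a & b) ^ (a & c) ^ (b & c)       # carry into the next position
    return s_bits, c
```

Each loop iteration depends on the carry produced by the previous one, which is the sequential dependency that makes the depth grow linearly with the register length.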

To reduce this time, a carry-lookahead method was devised. In this method, two additional 
values are defined as follows: 

$$g_i = a_i \cdot b_i, \tag{1}$$
$$p_i = a_i \oplus b_i. \tag{2}$$

Implicitly, $g_i$ and $p_i$ determine whether this bit position generates a carry out independent of the carry in, or propagates its incoming carry to its output carry, respectively. Only one of these may be true, though both may be false (called carry kill, though kill is not necessary in the actual circuit). The carry output for the $(i+1)$-th position is generated as $c_{i+1} = g_i \oplus p_i \cdot c_i$. Therefore, if $g_i$ is one, $c_{i+1}$ has no dependence on $c_i$, and hence the carry chain is disconnected. However, if $g_i$ is zero, $c_{i+1}$ is dependent on $c_i$. In the worst case, the longest chain is from $c_1$ to $c_n$. To decompose this long chain into sub-units, two variables $G[i,j]$ and $P[i,j]$ are also defined as follows.



$$G[i,j] = g_j \oplus p_j \cdot G[i,j-1], \tag{3}$$
$$P[i,j] = p_j \cdot P[i,j-1]. \tag{4}$$

$G[i,j]$ indicates whether an entire span of the addition, from qubit $i$ to qubit $j$, generates a carry. Similarly, $P[i,j]$ indicates whether the span propagates the carry from position $i$ all the way to position $j$. By calculating these values concurrently and progressively increasing the span of $G$ and $P$, the total time to create complete carry information for the entire register can be reduced to $O(\log n)$, provided that communication within the system is adequately fast.
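The recurrences (3) and (4) can be illustrated classically as follows. This sketch evaluates the spans sequentially from the first position, so it shows only what $G$ and $P$ mean, not the logarithmic-depth tree evaluation; the function names are hypothetical, and the base case $G[1,1] = g_1$, $P[1,1] = p_1$ is taken from the per-column base case used later in the text.

```python
def span_generate_propagate(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], list[int]]:
    """Return lists G, P where G[j] = G[1, j+1] and P[j] = P[1, j+1] (0-indexed lists)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Eq. (1)
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # Eq. (2)
    G, P = [g[0]], [p[0]]                         # base case: span of length one
    for j in range(1, len(g)):
        G.append(g[j] ^ (p[j] & G[j - 1]))        # Eq. (3)
        P.append(p[j] & P[j - 1])                 # Eq. (4)
    return G, P


def carries_from_spans(G: list[int], P: list[int], c_in: int = 0) -> list[int]:
    """Carry out of each position, given the carry c_in into the first position."""
    return [Gj ^ (Pj & c_in) for Gj, Pj in zip(G, P)]
```

Once the spans are known, every carry follows from the single incoming carry `c_in` without walking the chain again, which is the property the column-based scheme below relies on.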

Unfortunately, this carry-lookahead addition algorithm is defined assuming no limitation on interaction distance, and hence cannot be applied to the 2D NTC architecture without modification. In this work, we slightly modify the carry-lookahead algorithm. The modified algorithm consists of three phases as follows.

2.2.1 Phase 1: Ripple Carry Addition on the First Column, and Carry-Lookahead on the 
Other Columns 

As shown in Figure 2, the first column does the typical ripple carry addition. From the first position to the last position, each position generates a summation value and a carry output as follows.

$$s_i = a_i \oplus b_i \oplus c_i, \tag{5}$$
$$c_{i+1} = a_i \cdot b_i \oplus a_i \cdot c_i \oplus b_i \cdot c_i, \tag{6}$$

where $c_1 = 0$. Since the carry output of the $i$-th position must be used as input for the next $(i+1)$-th position, there is an information dependency, hence this step takes about $O(\sqrt{n})$ time.

During this time, the other columns concurrently generate other necessary information for carry-lookahead operations. For example, the $k$-th column works as follows. First, each $(k, j)$ cell generates $g_{(k-1)\sqrt{n}+j}$ and $p_{(k-1)\sqrt{n}+j}$ concurrently,

$$g_{(k-1)\sqrt{n}+j} = a_{(k-1)\sqrt{n}+j} \cdot b_{(k-1)\sqrt{n}+j}, \tag{7}$$
$$p_{(k-1)\sqrt{n}+j} = a_{(k-1)\sqrt{n}+j} \oplus b_{(k-1)\sqrt{n}+j}, \tag{8}$$

where $1 \le j \le \sqrt{n}$. After that, each $(k, j)$ cell generates $G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ and $P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ sequentially,

$$G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j] = g_{(k-1)\sqrt{n}+j} \oplus p_{(k-1)\sqrt{n}+j} \cdot G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j-1], \tag{9}$$
$$P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j] = p_{(k-1)\sqrt{n}+j} \cdot P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j-1], \tag{10}$$

where $G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+1] = g_{(k-1)\sqrt{n}+1}$ and $P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+1] = p_{(k-1)\sqrt{n}+1}$. The same process is applied for the other columns.

After this phase, the first column has generated its final summation output and also the carry output $c_{\sqrt{n}+1}$. The other columns have generated the column-level carry-lookahead information $G[(k-1)\sqrt{n}+1, k\sqrt{n}]$ and $P[(k-1)\sqrt{n}+1, k\sqrt{n}]$.

Fig. 2. First phase. During this phase, the first column executes a ripple-carry adder. Each other ($k$-th) column generates $g_{(k-1)\sqrt{n}+j}$ and $p_{(k-1)\sqrt{n}+j}$ concurrently, and then $G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ and $P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ sequentially.

2.2.2 Phase 2: Inter-Column Carry Propagation

The final carry output of the first column, $c_{\sqrt{n}+1}$, is given as an initial input value for the column-level carry generation logic as shown in Figure 3. Each column, except the first, generates its column-level carry output as follows.

$$\mathit{Column\_carry}_k = c_{k\sqrt{n}+1} = G[(k-1)\sqrt{n}+1, k\sqrt{n}] \oplus c_{(k-1)\sqrt{n}+1} \cdot P[(k-1)\sqrt{n}+1, k\sqrt{n}]. \tag{11}$$

2.2.3 Phase 3: Carry Generation and Summation

After the first phase, each $(k,j)$ cell has the carry-lookahead information $G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ and $P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$. After the second phase, each column has the incoming column-level carry $c_{(k-1)\sqrt{n}+1}$. By propagating the incoming column-level carry as shown in Figure 4, each $(k,j)$ cell can calculate the carry out of its position $i = (k-1)\sqrt{n}+j$, which is the carry input of the next position, as

$$c_{i+1} = c_{(k-1)\sqrt{n}+j+1} = G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j] \oplus c_{(k-1)\sqrt{n}+1} \cdot P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]. \tag{12}$$

For $j = \sqrt{n}$ this reduces to Equation (11). After that, each cell can generate the final summation value from its own carry input as

$$s_i = s_{(k-1)\sqrt{n}+j} = a_{(k-1)\sqrt{n}+j} \oplus b_{(k-1)\sqrt{n}+j} \oplus c_{(k-1)\sqrt{n}+j}. \tag{13}$$
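To make the data flow of the three phases concrete, here is a classical bit-level simulation in Python. It is a sketch under several assumptions: $n$ is a perfect square, `c[i]` denotes the carry into position $i$ (so `c[1] = 0` as in Equation (6)), the phases that run concurrently on hardware are simply executed one after another, and the name `three_phase_add` is hypothetical. It checks the arithmetic only and says nothing about gate-level depth.

```python
import math


def three_phase_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (sum mod 2**n, final carry) for two n-bit integers, n a perfect square."""
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n is assumed to be a perfect square")
    # 1-indexed bit lists; index 0 is unused padding.
    abit = [0] + [(a >> i) & 1 for i in range(n)]
    bbit = [0] + [(b >> i) & 1 for i in range(n)]
    s = [0] * (n + 1)
    c = [0] * (n + 2)             # c[i] is the carry into position i
    G = [0] * (n + 1)             # G[i] = G[first position of i's column, i]
    P = [0] * (n + 1)             # P[i] = P[first position of i's column, i]

    # Phase 1, first column: ripple-carry addition, Eqs. (5)-(6).
    for i in range(1, r + 1):
        s[i] = abit[i] ^ bbit[i] ^ c[i]
        c[i + 1] = (abit[i] & bbit[i]) ^ (abit[i] & c[i]) ^ (bbit[i] & c[i])

    # Phase 1, other columns: per-column spans, Eqs. (7)-(10).
    for k in range(2, r + 1):
        start = (k - 1) * r + 1
        for i in range(start, start + r):
            g, p = abit[i] & bbit[i], abit[i] ^ bbit[i]
            if i == start:
                G[i], P[i] = g, p
            else:
                G[i], P[i] = g ^ (p & G[i - 1]), p & P[i - 1]

    # Phase 2: column-level carries ripple from column to column, Eq. (11).
    for k in range(2, r + 1):
        start, end = (k - 1) * r + 1, k * r
        c[end + 1] = G[end] ^ (c[start] & P[end])

    # Phase 3: carries inside each column, then the sum bits, Eqs. (12)-(13).
    for k in range(2, r + 1):
        start = (k - 1) * r + 1
        for i in range(start, start + r - 1):
            c[i + 1] = G[i] ^ (c[start] & P[i])
        for i in range(start, start + r):
            s[i] = abit[i] ^ bbit[i] ^ c[i]

    total = sum(s[i] << (i - 1) for i in range(1, n + 1))
    return total, c[n + 1]
```

Only the Phase 2 loop carries a dependency from one column to the next; within Phase 3 every cell of a column needs only that column's incoming carry.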



2.3 Circuit Layout 

In the first phase, the first column and the other columns use different circuit blocks. The circuit blocks for the first column are shown in Figure 5(a). To do the ripple carry addition, a half-adder (HA) for the first position and $\sqrt{n}-1$ full-adders (FA) are used. The circuit blocks for the other columns are shown in Figure 5(b). As explained in the previous part, each of these columns first generates $g_{(k-1)\sqrt{n}+j}$ and $p_{(k-1)\sqrt{n}+j}$ concurrently by using the g, p circuit blocks and then $G[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ and $P[(k-1)\sqrt{n}+1, (k-1)\sqrt{n}+j]$ sequentially by using the G, P circuit blocks.

Fig. 3. Second phase. The purpose of this phase is to generate the column-level carry output for each column sequentially.

The block-level circuit for the second phase is shown in Figure 6. The circuit block Col-carry has three inputs: G and P from the corresponding column and Column_carry from the lower column.

Figure 7 shows the circuit blocks for the third phase. In the figure, c and c1 represent the blocks for generating the carry output for the $i$-th position. Note that for the first row, p and g are the same as P and G, and hence the circuit block is slightly different. SUM, SUM1, and SUM2 are for generating the final summation value for the $j$-th position.

Fig. 4. Third phase. Using the incoming carry for each column, all carry and sum values are generated sequentially. The figure marks three steps: first, transport Column_carry; second, generate $c_i$ for each position; third, generate $s_i$ for each position.

2.4 Clearing Ancillae Qubits

As shown in Table 2, three types of ancillae qubits are used: $c_i$, $P[i,j]$, and $\mathit{Column\_carry}_k$. To clean these ancillae, we use the strategy proposed in Reference [14]. The key idea of this approach is based on the observation that in two's complement arithmetic

$$-x = \bar{x} + 1 \pmod{2^n}, \tag{14}$$
$$x + \bar{x} = -1 \pmod{2^n}, \tag{15}$$
$$-x - 1 = \bar{x} \pmod{2^n}, \tag{16}$$

where $\bar{x}$ is the bit-wise inversion of $x$. Let us consider an addition of $A$ and $B$, $\mathrm{ADD}(A, B, 0) = (A, S, C)$, where $S$ and $C$ are the bitwise sum and carry vectors, respectively. Let us consider another addition of $A$ and $\bar{S}$, $\mathrm{ADD}(A, \bar{S}, 0) = (A, \bar{B}, D)$, where $\bar{B}$ and $D$ are the sum and carry vectors, respectively. Note the bitwise sum is $\bar{B}$ because

$$A + \bar{S} = A - (A + B + 1) = -B - 1 = \bar{B}. \tag{17}$$
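Equation (17), and the relation between the carry vectors $C$ and $D$ on which the clearing procedure relies, can be checked by brute force for small register sizes. The Python sketch below does so; the helper names are hypothetical, and the carry vector is represented as an integer whose bit $i$ is the carry out of bit position $i$ (an encoding chosen for this illustration).

```python
def carry_vector(x: int, y: int) -> int:
    """Integer whose bit i is the carry out of bit position i of x + y (carry-in 0)."""
    # (x + y) ^ x ^ y has the carry INTO each position; shifting right by one
    # re-labels it as the carry OUT of the position below.
    return ((x + y) ^ x ^ y) >> 1


def check_clearing_identity(n: int) -> bool:
    """Exhaustively test A + not(S) = not(B) (mod 2**n) and C == D for n-bit inputs."""
    mask = (1 << n) - 1
    for a in range(1 << n):
        for b in range(1 << n):
            s = (a + b) & mask
            s_bar = ~s & mask                 # bitwise inversion of S
            b_bar = ~b & mask                 # bitwise inversion of B
            if (a + s_bar) & mask != b_bar:   # Eq. (17)
                return False
            if carry_vector(a, b) != carry_vector(a, s_bar):
                return False
    return True
```

Calling `check_clearing_identity(n)` for small `n` exercises every pair of inputs, which is feasible only because the search space is $2^{2n}$.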
It is worth noting that $C$ must be equal to $D$.

Now we follow the circuit as shown in Figure 8. Conceptually, any addition circuit can be divided into two parts, CARRY generation ($C_i$) and SUM generation ($S_i$). As shown in the figure, we apply CARRY as follows.

$$\mathrm{CARRY}(A, B, 0) \Rightarrow (A, A \oplus B, C). \tag{24}$$

As the second step, we apply SUM as follows.

$$\mathrm{SUM}(A, A \oplus B, C) \Rightarrow (A, A \oplus B \oplus C, C) = (A, S, C). \tag{25}$$

Next, we apply two operations,

$$\mathrm{NOT}_2(A, S, C) \Rightarrow (A, \bar{S}, C), \tag{26}$$
$$\mathrm{CNOT}_{1,2}(A, \bar{S}, C) \Rightarrow (A, A \oplus \bar{S}, C). \tag{27}$$

Meanwhile,

$$\mathrm{CARRY}(A, \bar{S}, 0) \Rightarrow (A, A \oplus \bar{S}, D). \tag{28}$$

Since the two carry vectors $C$ and $D$ for $A + B$ and $A + \bar{S}$ are the same, the above line changes to

$$\mathrm{CARRY}(A, \bar{S}, 0) \Rightarrow (A, A \oplus \bar{S}, C). \tag{29}$$

Therefore, running the inverse operation,

$$\mathrm{CARRY}^{-1}(A, A \oplus \bar{S}, C) \Rightarrow (A, \bar{S}, 0). \tag{30}$$

Finally, apply $\mathrm{NOT}_2$ as follows,

$$\mathrm{NOT}_2(A, \bar{S}, 0) \Rightarrow (A, S, 0), \tag{31}$$

to generate the final sum and clean the ancillae.

Fig. 5. Circuit flow for the first phase: (a) first column; (b) other columns. Note FA and HA are the full adder and the half adder, respectively; s and c are initially $|0\rangle$, and $s_i$ and $c_i$ are the summation and carry for each position, respectively.
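The register flow of Equations (24) to (31) can be traced classically on bit vectors. The Python sketch below is an illustration only. It assumes that CARRY acts reversibly as $(A, X, Z) \mapsto (A, A \oplus X, Z \oplus \mathrm{carries}(A, X))$, it encodes the carry register as an integer whose bit $i$ is the carry into bit position $i$, and it leaves the final carry-out bit out of the register. The function names are hypothetical.

```python
def carries_into(x: int, y: int, n: int) -> int:
    """Integer whose bit i (0 <= i < n) is the carry into bit position i of x + y."""
    return ((x + y) ^ x ^ y) & ((1 << n) - 1)


def add_and_clear(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Follow Eqs. (24)-(31) and return the final (A, second register, carry register)."""
    mask = (1 << n) - 1
    # (24) CARRY: the second register becomes A xor B, the third becomes C.
    reg_a, reg_b, reg_c = a, a ^ b, carries_into(a, b, n)
    # (25) SUM: the second register becomes S = A xor B xor C.
    reg_b ^= reg_c
    # (26) NOT on the second register: S -> not(S).
    reg_b = ~reg_b & mask
    # (27) CNOT from the first onto the second register: not(S) -> A xor not(S).
    reg_b ^= reg_a
    # (30) inverse CARRY: recover not(S) and XOR out the carries of A + not(S).
    reg_b ^= reg_a
    reg_c ^= carries_into(reg_a, reg_b, n)
    # (31) NOT on the second register: not(S) -> S.
    reg_b = ~reg_b & mask
    return reg_a, reg_b, reg_c
```

For every pair of $n$-bit inputs the result should be $(A, S, 0)$, because the carries of $A + \bar{S}$ coincide with those of $A + B$ and therefore cancel in the inverse CARRY step.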

3 Analysis 

3.1 Depth Analysis 

To analyze the depth of the proposed adder, we have to decompose the circuit blocks into elementary gates, which can in turn be decomposed into unit-delay gates. In this work, we assume that one-qubit, CNOT, and Control-$\sqrt{\mathrm{NOT}}$ gates have unit delay. The elementary gates we have chosen for constructing our circuits are SWAP, CCNOT, CNOT, Control-$\sqrt{\mathrm{NOT}}$, and one-qubit gates. In this paper, we use the three-CNOT construction for the SWAP gate. Figure 9 shows the conventional form of CCNOT (left) and its decomposition into one-qubit and two-qubit gates (right).
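The three-CNOT construction of SWAP can be checked on computational basis states with a few lines of Python. This is a classical sketch with hypothetical function names; because each CNOT permutes basis states, agreement on all basis states carries over to superpositions by linearity.

```python
def cnot(control: int, target: int) -> tuple[int, int]:
    """CNOT on basis states: the target is flipped when the control is 1."""
    return control, target ^ control


def swap_via_three_cnots(x: int, y: int) -> tuple[int, int]:
    """Exchange two bits using three CNOTs with alternating control and target."""
    x, y = cnot(x, y)     # y becomes x xor y
    y, x = cnot(y, x)     # x becomes the original y
    x, y = cnot(x, y)     # y becomes the original x
    return x, y
```

With CNOT counted as a unit-delay gate, this construction gives SWAP a depth of three unit-gate steps.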

3.1.1 Circuit Decomposition with NTC Constraints 

Now we decompose the circuit blocks for the three phases with the chosen elementary gates and the SWAP operations necessary to satisfy the NTC constraints. The blocks are shown in Figures 10 to 14. The circuit of HALF ADDER is shown in Figure 10(a). Figure 10(b) shows a decomposition of FULL ADDER into elementary gates. In this figure, there is no limitation on the distance between operands for a gate. To satisfy the NTC constraint, we redesign it as shown in Figure 10(c) by adding several SWAP gates to move the qubits to neighboring positions. This approach is also applied for the following circuit blocks. The circuits for g and p, and the generalized G and P, are shown in Figure 11. The circuit of Column_carry is shown in Figure 12. For generating $|\mathit{Col\_Carry}_{k+1}\rangle$, a single CCNOT is enough. However, to propagate it to the next column and to propagate $|\mathit{Col\_Carry}_k\rangle$ to the rows, a SWAP is necessary. For implementing the last SWAP gate in the neighbor-interaction-only case, several SWAPs are necessary as shown in Figure 12(b). The initial circuit for Carry is shown in Figure 13(a). Since the Col-carry has to be moved to the upper row, several SWAPs are necessary as shown in Figure 13(b). After this circuit, the Col-carry is transported to the top position, and the others are moved to the lower row. Since the carry for the first row is different from other rows, Figures 13(c) and 13(d) show its circuits. The circuits for SUM are shown in Figures 14(a) and 14(b). For the second and the first row, we have to use slightly different circuits as shown in Figures 14(c) and 14(d) and Figure 14(e), respectively.

Fig. 6. Circuit flow for the second phase. The Col-carry block generates a column-level carry output, which is used as the actual incoming carry value for the next (right) column; the last Column_carry is the final carry.

3.1.2 Total Depth 

Based on the revised circuits satisfying the NTC constraint, we can summarize the depth of each elementary gate and circuit block as shown in Table 1.

The proposed adder works in three sequential phases, and hence the overall depth is the sum of the depths for each phase. The depth for each phase is the "long pole", or the longest delay among the parallel execution paths. In the first column, one HA and $(\sqrt{n} - 1)$ FA operations are executed sequentially. Since HA needs 10 unit-gate steps and FA needs 26 unit-gate steps, $26\sqrt{n} - 16$ unit-gate steps are needed. On the other hand, the other columns need one g, p block and $(\sqrt{n} - 1)$ G, P blocks, which takes $36\sqrt{n} - 26$ steps. The overall depth for the first phase is the longer of the two column types, hence $36\sqrt{n} - 26$. The second phase consists of $(\sqrt{n} - 1)$ Column_carry operations, requiring a total of $18\sqrt{n} - 18$ time steps. The third phase consists of $(\sqrt{n} - 1)$ Carry, one Carry1, and one SUM1 operations for the longest path. Hence, the depth is $21\sqrt{n} + 1$ unit-gate steps. By summing the depths of each phase, the total depth is $75\sqrt{n} - 43$.

The above depth is only for generating the summation output without clearing the ancillae. For clearing ancillae, we apply more circuits as shown in Figure 8. Based on this figure, we can decompose the above three phases into the carry generation flow and the sum generation flow. The first and the second phases are for the carry generation flow. The third phase has to be divided into the carry generation flow and the sum generation flow. The above depth is apportioned as $75\sqrt{n} - 50$ for the carry generation flow and 7 for the sum generation flow. As shown in Figure 8, we need to apply NOT and CNOT gates and then the inverse of the carry generation flow again with the final NOT gate. Hence, the overall depth is $(75\sqrt{n} - 50) + 7 + 1 + 1 + (75\sqrt{n} - 50) + 1 = 150\sqrt{n} - 90$.

Fig. 7. Circuit flow for the third phase. c represents the block for generating the carry output for the $i$-th position. SUM is for generating the final summation value for the $i$-th position.
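The depth bookkeeping above can be reproduced with a short Python sketch. The per-block step counts are the ones quoted in the text and Table 1; the dictionary keys, the function name `adder_depth`, and the returned field names are hypothetical labels chosen for this illustration.

```python
import math

# Unit-gate steps per circuit block, as quoted in the depth analysis.
UNIT_STEPS = {
    "HA": 10, "FA": 26, "g_p": 10, "G_P": 36,
    "Column_carry": 18, "Carry": 21, "Carry1": 15, "SUM1": 7,
}


def adder_depth(n: int) -> dict[str, int]:
    """Depth of each phase and of the full adder with ancilla clearing."""
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("n is assumed to be a perfect square")
    first_column = UNIT_STEPS["HA"] + (r - 1) * UNIT_STEPS["FA"]
    other_columns = UNIT_STEPS["g_p"] + (r - 1) * UNIT_STEPS["G_P"]
    phase1 = max(first_column, other_columns)          # longest parallel path
    phase2 = (r - 1) * UNIT_STEPS["Column_carry"]
    phase3 = (r - 1) * UNIT_STEPS["Carry"] + UNIT_STEPS["Carry1"] + UNIT_STEPS["SUM1"]
    forward = phase1 + phase2 + phase3                 # sum without clearing
    carry_flow = forward - UNIT_STEPS["SUM1"]          # carry generation part
    # carry flow + sum flow + NOT + CNOT + inverse carry flow + final NOT
    total = carry_flow + UNIT_STEPS["SUM1"] + 1 + 1 + carry_flow + 1
    return {"phase1": phase1, "phase2": phase2, "phase3": phase3,
            "forward": forward, "total": total}
```

For $n = 64$, for instance, `adder_depth(64)["total"]` evaluates the closed form $150\sqrt{n} - 90$ at $\sqrt{n} = 8$.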

3.2 Required Space 

The number of qubits for the adder is shown in Table 2. As shown in the first column, some qubits work for multiple purposes. Note the additional number of qubits is $2n - \sqrt{n}$, which is less than twice the minimum $2n$ qubits [12, 15].

Fig. 8. Clearing Ancillae Qubits. By applying the inverse of the carry generation flow, the ancillae qubits can be cleaned.

Fig. 9. Circuit for CCNOT.

3.3 Comparison to Other Adders 

When only interactions between neighboring qubits are allowed, the depth of arithmetic circuits increases. For the 2D case, the depth lower bound was proven to be $\Omega(\sqrt{n})$ [22]. Therefore, the depth of the proposed adder is asymptotically optimal.

Beyond the asymptotic behavior, it is more interesting and important to compare with other adders in practical cases. Specifically, it is necessary to compare adders designed for the 1D NTC architecture since they can be implemented on the 2D NTC architecture without modification, using a simple serpentine qubit layout. The overall analysis and the comparison between the adders are shown in Table 3. The first column distinguishes the architecture and the second column lists the adder type. For the 1D NTC architecture, we choose three typical adders. Vedral et al. proposed a plain ripple-carry adder [6], named VBE in the table. VBE-Improved is the Van Meter and Itoh update to this adder [18]. Cuccaro et al. proposed a ripple carry adder with only one ancilla qubit [12], named CDKM. For the 2D NTC architecture, the present adder is shown. For the architecture with arbitrary-distance interaction (AC in the table), several adders are evaluated. Draper proposed a quantum Fourier transform adder [13], named QFT-based. By exploiting the classical fast addition algorithm, Draper et al. also proposed a carry-lookahead adder [14], named CLA-based. Kawata et al. also proposed an adder based on the combination of a ripple carry adder and a carry-lookahead adder [24], named RCA+CLA-based. For comparison, the depth and the size of each adder are shown in the third column. In this work, the depth is measured in units of one- and two-qubit gates for the 1D and 2D NTC architectures. On the other hand, the depth for the AC architecture is based on one-qubit, two-qubit, and CCNOT gates. The size is the number of qubits for input, output, and ancillae. The fourth column shows the input size at which the present adder becomes faster than the selected adder. In the fifth column, we calculate KQ, the product of qubits and depth, where K and Q are the numbers of logical qubits and computational steps, respectively [23]. KQ is used to estimate the strength of error correction required.

From this table we can point out three key results. First, when the size of the input is larger than 58, the present adder works faster than the 1D NTC adders. Second, the present adder needs about twice as many qubits as the 1D NTC adders. Lastly, the present adder has a smaller KQ factor when the input size is larger than 278.

Fig. 10. Circuit for HALF ADDER (a); Circuits for FULL ADDER with arbitrary interaction (b) and with only nearest-neighbor interaction (c).

Fig. 11. Circuits for g, p (a); Circuits for G and P with arbitrary interaction (b) and with only nearest-neighbor interaction (c).
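The depth crossover against the CDKM adder can be reproduced from the two depth expressions in Table 3, $150\sqrt{n} - 90$ and $18n + 14$. The Python sketch below treats $\sqrt{n}$ as a real number, which is an assumption made only for this comparison (the layout itself assumes a perfect square); the function names are hypothetical.

```python
import math


def depth_2d(n: int) -> float:
    """Depth of the present 2D NTC adder, with sqrt(n) taken as a real number."""
    return 150 * math.sqrt(n) - 90


def depth_cdkm_1d(n: int) -> int:
    """Depth of the CDKM ripple-carry adder on the 1D NTC architecture."""
    return 18 * n + 14


def first_n_where_2d_is_faster(limit: int = 10_000):
    """Smallest n up to limit with depth_2d(n) < depth_cdkm_1d(n), or None."""
    for n in range(1, limit + 1):
        if depth_2d(n) < depth_cdkm_1d(n):
            return n
    return None
```

Under these assumptions the search returns 58, in line with the threshold quoted in the text, and the 2D expression stays below the 1D expression for all larger register lengths.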

4 Conclusion and Open Problems 

In this work, we proposed a quantum adder for the 2D NTC architecture for the first time. Van Meter and Oskin indicated that an adder would have $O(\sqrt{n})$ time complexity on a 2D architecture, but no circuit has been provided [25]. The proposed adder has depth complexity $O(\sqrt{n})$ with $O(n)$ qubits. We found that the proposed adder works faster than a 1D ripple-carry adder when the length of the input registers is larger than 58, and requires about two times the number of additional qubits.

Fig. 12. Circuits for Column_carry with arbitrary interaction (a) and with only nearest-neighbor interaction (b).

Fig. 13. Circuits for Carry with arbitrary interaction (a) and with only nearest-neighbor interaction (b); Circuits for Carry1 with arbitrary interaction (c) and with only nearest-neighbor interaction (d).

Although this adder is, to the best of our knowledge, the first one specifically designed for a 2D architecture, we suspect it will not be the last; we anticipate that several improvements are possible. First, the number of additional gates is very large. Most of the gates for the proposed adder are used for transporting qubits to neighboring positions so that gates can be executed. By arranging qubits in a better way, we may be able to reduce the necessary propagation operations. Second, the phase for cleaning the ancillae qubits roughly doubles the total number of quantum operations. In the present adder, the ancillae qubits are reinitialized by applying the inverse circuit, doubling the overall depth. Perhaps there is some way to reduce this drawback by exploiting some overlap of the clearing phase with the computation phase. Third, the number of ancillae is also very large. The proposed design attempts to achieve the highest parallel execution at the expense of requiring more ancillae, but this tradeoff may prove to be less than optimal for two reasons. First, qubits themselves are expensive resources, and in many applications could be allocated to other work if not used directly in the adder; second, inserting the ancillae into our layout increases the distance between qubits,
forcing the addition of more SWAPs and slowing down the circuit.

Table 1. Depth analysis of each gate and circuit

| Gate or circuit block | Composition of the longest path | # of unit-gate steps |
|---|---|---|
| SWAP | 3 CNOTs | 3 |
| CCNOT | 1 SWAP + 6 unit gates | 9 |
| HALF ADDER | 1 CCNOT + 1 CNOT | 10 |
| FULL ADDER | 2 CCNOTs + 2 CNOTs + 2 SWAPs | 26 |
| g and p | 1 CCNOT + 1 CNOT | 10 |
| G and P | 2 CCNOTs + 6 SWAPs | 36 |
| Column_carry | 1 CCNOT + 3 SWAPs | 18 |
| Carry | 1 CCNOT + 4 SWAPs | 21 |
| Carry1 | 1 CCNOT + 2 SWAPs | 15 |
| SUM | 1 CNOT + 4 SWAPs | 13 |
| SUM1 | 1 CNOT + 2 SWAPs | 7 |
| SUM2 | 1 CNOT | 1 |

Table 2. Number of Qubits

| Name | Number of qubits | Explanation |
|---|---|---|
| $a_i$ | $n$ | Input A |
| $b_i \to p_i \to s_i$ | $n$ | Input B, carry propagate for the $i$-th position, and summation S |
| $\lvert 0\rangle \to g_i \to G[i,j] \to c_i$ | $n$ | Carry generation for the $i$-th position, carry generation between $i$ and $j$, and carry for the $i$-th position |
| $\lvert 0\rangle \to P[i,j]$ | $n - 2\sqrt{n} + 1$ | Carry propagation between $i$ and $j$ |
| $\mathit{Column\_carry}_k$ | $\sqrt{n}$ | Inter-column carry. The last Column_carry is for the final carry output. |
| Total | $(2n + 1) + (2n - \sqrt{n})$ | Mandatory + Additional |

Table 3. Comparison with Other Designs

| Architecture | Name of Adder | (Depth, Number of Qubits) | When is the present adder faster than the corresponding adder? | KQ [22] |
|---|---|---|---|---|
| 1D NTC | VBE [6] | $(76n - 30,\ 3n + 1)$ | $n > 4$ | $228n^2 - O(n)$ |
| 1D NTC | VBE-Improved [18] | | | |
| 1D NTC | CDKM [12] | $(18n + 14,\ 2n + 2)$ | $n > 58$ | $36n^2 + O(n)$ |
| 2D NTC | Present Adder | $(150\sqrt{n} - 90,\ 4n - \sqrt{n} + 1)$ | | $600n\sqrt{n} - O(n)$ |
| AC | QFT-based [13] | $(3\log n,\ 2n + 1)$ | N/A | $6n \log n + O(\log n)$ |
| AC | CLA-based [14] | $(2\log n + 2,\ 4n - \log n)$ | N/A | $8n \log n + O(n)$ |
| AC | RCA+CLA-based [24] | $(10\log n + 6n/\log n,\ n + 4n/\log n)$ | N/A | $10n \log n + O(n^2/\log n)$ |